Skip to content

[WIP] Update Dsv4 B300 configs#1656

Open
wzhao18 wants to merge 6 commits into
mainfrom
wzhao/dsv4-b300-nvfp4
Open

[WIP] Update Dsv4 B300 configs#1656
wzhao18 wants to merge 6 commits into
mainfrom
wzhao/dsv4-b300-nvfp4

Conversation

@wzhao18

@wzhao18 wzhao18 commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Note

Low Risk
Benchmark and runner config only; no auth, serving API, or production inference path changes.

Overview
Switches the dsv4-fp4-b300-vllm recipe to the nvidia/DeepSeek-V4-Pro-NVFP4 checkpoint instead of the upstream deepseek-ai/DeepSeek-V4-Pro weights, and registers DeepSeek-V4-Pro-NVFP4 in the B300 runner’s pre-staged model list so jobs can load from scratch without a fresh HF pull.

The single-node benchmark script no longer passes --moe-backend deep_gemm_mega_moe when data-parallel attention is enabled—that block is commented out, so DP layouts run with default MoE backend behavior for this NVFP4 setup.

Reviewed by Cursor Bugbot for commit 41630f2. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Comment thread perf-changelog.yaml Outdated
Comment on lines +3434 to +3438
- config-keys:
- dsv4-fp4-b300-vllm
description:
- "Update B300 dsv4 image to nvfp4"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1652

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Two issues in the new perf-changelog entry for dsv4-fp4-b300-vllm: (1) The pr-link references #1652, but that's the MiniMax-M2.5 PR (the entry immediately above) — this is a copy-paste, the link should point to this PR (#1656). (2) The description says "Update B300 dsv4 image to nvfp4", but the diff only changes the model field; the image (vllm/vllm-openai:v0.22.0) is unchanged, so it should say something like "Update B300 dsv4 model to NVFP4 checkpoint".

Extended reasoning...

Bug 1: Wrong PR link

The new changelog entry at line 3438 sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1652. However, PR #1652 is [B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs (already merged at commit 7d4063d) — a completely different PR. The current PR is #1656 ("[WIP] Update Dsv4 B300 configs"). The entry immediately above (line 3432, for minimaxm2.5-fp4-b300-dynamo-vllm) correctly references #1652, so this is almost certainly a copy-paste error from the adjacent entry.

Bug 2: Misleading description ("image" vs "model")

The description says "Update B300 dsv4 image to nvfp4". But the diff in .github/configs/nvidia-master.yaml only changes the model field:

-  model: deepseek-ai/DeepSeek-V4-Pro
+  model: nvidia/DeepSeek-V4-Pro-NVFP4

The image field (vllm/vllm-openai:v0.22.0) is unchanged. In perf-changelog.yaml "image" consistently refers to the Docker container image — e.g. line 3430 of the same file says "Image: vllm/vllm-openai:v0.20.1". The PR's own Cursor Bugbot summary even acknowledges this: "Image, search space, and other recipe fields are unchanged." Calling the model swap an "image" change is misleading.

Step-by-step proof

  1. The PR metadata shows this is PR [WIP] Update Dsv4 B300 configs #1656, titled "[WIP] Update Dsv4 B300 configs".
  2. Recent commit history (7d4063d) confirms PR [B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs #1652 is a separate, already-merged PR titled "[B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs".
  3. In perf-changelog.yaml, the entry at line 3432 (the MiniMax-M2.5 entry) legitimately uses pr-link: .../pull/1652.
  4. The newly added entry at line 3438 (this PR's entry) uses the same pr-link: .../pull/1652 — duplicating the link from the line above instead of referencing [WIP] Update Dsv4 B300 configs #1656.
  5. Reading the diff in .github/configs/nvidia-master.yaml, only model: changes; image: stays at vllm/vllm-openai:v0.22.0. So "Update B300 dsv4 image to nvfp4" describes a change that did not happen, and omits the change that did.

Suggested fix

- config-keys:
    - dsv4-fp4-b300-vllm
  description:
    - "Update B300 dsv4 model to NVFP4 checkpoint (nvidia/DeepSeek-V4-Pro-NVFP4)"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1656

Impact

Documentation-only — no runtime behavior is affected. But the changelog is the canonical record for tracing perf changes back to their PR, and the incorrect link points readers to an unrelated PR with no context for this model swap.

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

1 similar comment
@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 41630f2. Configure here.

fi
# if [ "${DP_ATTENTION}" = "true" ]; then
# MOE_ARGS=(--moe-backend deep_gemm_mega_moe)
# fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DP-attn megamoe backend disabled

Medium Severity

With DP_ATTENTION=true, the script no longer passes --moe-backend deep_gemm_mega_moe, but dsv4-fp4-b300-vllm still schedules high-concurrency dp-attn/ep points. That diverges from the prior B300 pareto recipe and from dsv4_fp4_b300_vllm_mtp.sh / B200 vLLM siblings, so those runs may not match the intended serving path.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 41630f2. Configure here.

@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

@wzhao18 wzhao18 marked this pull request as draft June 5, 2026 16:06
@wzhao18 wzhao18 marked this pull request as ready for review June 8, 2026 20:10
@claude

claude Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Claude finished @wzhao18's task in 2m 7s —— View job


Review of PR #1656

  • Gather context and read changed files
  • Validate master config changes
  • Validate perf-changelog entry
  • Validate benchmark script
  • Validate runner launch script
  • Post review summary

Summary

Config and runner changes look correct — the model swap to nvidia/DeepSeek-V4-Pro-NVFP4, the pre-staged model list addition, and the --gpu-memory-utilization 0.97 bump are all straightforward.

One issue found:

  • 🟡 perf-changelog.yaml: Description says "Update B300 dsv4 image to nvfp4" but the image (vllm/vllm-openai:v0.22.0) is unchanged — only the model checkpoint changed. This was flagged in the earlier review (the PR-link was corrected to [WIP] Update Dsv4 B300 configs #1656 but the description was not updated). The description also omits the --gpu-memory-utilization 0.97 addition and the MoE backend change. See inline comment with a suggested fix.

No blocking issues found. The rest of the changes pass validation: public Docker Hub image, correct EP conditional pattern, server launch formatted with line continuations, perf-changelog appended at end with correct PR link, and valid model-prefix.

Comment thread perf-changelog.yaml
- config-keys:
- dsv4-fp4-b300-vllm
description:
- "Update B300 dsv4 image to nvfp4"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING: Misleading description — says "image" but only the model field changed

Why it matters: The perf-changelog is the canonical record for tracing config changes. "image" in this file consistently refers to the Docker container image (e.g., vllm/vllm-openai:v0.22.0), which is unchanged here. The actual change is the model checkpoint swap to nvidia/DeepSeek-V4-Pro-NVFP4, plus the --gpu-memory-utilization 0.97 addition and MoE backend change. This was flagged in the earlier review (the PR-link was fixed but the description was not).

Fix:

Suggested change
- "Update B300 dsv4 image to nvfp4"
- "Switch B300 dsv4 model to nvidia/DeepSeek-V4-Pro-NVFP4 checkpoint; bump gpu-memory-utilization to 0.97; disable deep_gemm_mega_moe backend for dp-attn"

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant